RAID and NASD

 

 

NASD comments from reading:

-      bad explanation of architecture

-      file manager not well explained

 

RAID comments from reading

-      Failure assumptions may be limited (e.g. no correlated failures – what about batch failures?)

-      Caching not considered

 

 

RAID background

 

Problem: technology trends

-      computers getting faster and memories larger, need more disk bandwidth

-      disk bandwidth not riding Moore's law

-      faster CPU enables more computation to support storage

-      data intensive applications


 

Approaches:

 

-      SLED: single large expensive disk

-      RAID: redundant array of (independent, inexpensive) disks

 

NOTE:

-      Disk arrays had been done before

-      Contribution of this paper is a taxonomy and a way to compare them and organize them

 

Key ideas:

-      striping: write blocks of a file to multiple disks, can read/write in parallel

-      Redundancy: write extra data to extra disks for failure recovery, e.g. parity, ECC, or duplicate data. Redundancy can also improve performance – a read can go to the better-positioned copy (latency) or to both copies (throughput)

 

 

Why arrays?

-      Cheaper disks

-      Lower power

-      Smaller enclosures

-      Higher reliability

o      Can survive a disk failure

-      Larger bandwidth

o      Can read or write multiple disks at a time

 

How do you compare disk setups?

-      Price?

-      Power?

-      Size?

-      Performance?

o      What performance?

o      Large reads

o      Small reads

o      Large writes

o      Small writes

o      Read / modify / write (transaction processing)

 

 

Organization:

- take N disks, put into groups of G

 

RAID versions:

 

JBOD: just a bunch of disks, mount as separate volumes

-      Read / write performance for a file limited to single disk

-      Reliability for a byte is same as single disk, but file system can tolerate some disk failures with partial data loss

 

 

RAID 0: striping

-      Stripe data across disks in fixed-size units (see the mapping sketch below)

-      Best overall performance: G times the reads/sec and writes/sec of a single disk

-      Worst reliability: MTTF = MTTF(disk) / G
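
A minimal sketch of the striping idea, assuming G disks and round-robin placement of fixed-size units; the names and values below are illustrative, not from the paper:

    # Striping sketch: map a logical block number onto one of G disks.
    G = 4    # disks in the group (illustrative)

    def locate(logical_block: int) -> tuple[int, int]:
        """Return (disk index, block offset on that disk) for a logical block."""
        disk = logical_block % G           # round-robin across the G disks
        offset = logical_block // G        # position of the block on that disk
        return disk, offset

    # Blocks 0..G-1 land on G different disks, so they can be read in parallel.
    assert {locate(b)[0] for b in range(G)} == set(range(G))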

 

RAID 1: mirroring

-      store all data on two disks

-      write to both disks

-      read from whichever copy is better positioned (shorter seek – see the sketch below)

-      Write performance = single disk

-      Read performance = double

-      Overhead is 100%
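
A toy sketch of mirroring; the Disk model below is invented for illustration. Writes go to both copies, reads go to whichever copy needs the shorter seek:

    from dataclasses import dataclass, field

    @dataclass
    class Disk:
        head: int = 0                                  # current head position (block number)
        blocks: dict = field(default_factory=dict)     # block number -> data

        def write(self, block, data):
            self.blocks[block] = data
            self.head = block

        def read(self, block):
            self.head = block
            return self.blocks[block]

    mirrors = [Disk(), Disk()]

    def mirrored_write(block, data):
        for d in mirrors:                  # both copies updated, so write throughput = one disk
            d.write(block, data)

    def mirrored_read(block):
        # serve the read from the copy with the shorter seek (better positioned head)
        best = min(mirrors, key=lambda d: abs(d.head - block))
        return best.read(block)

    mirrored_write(10, b"data")
    assert mirrored_read(10) == b"data"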

 

RAID 2: bit-wise ECC

-      stripe data across disks in small units

-      Store ECC (Hamming code) bit-wise across multiple check disks

-      All reads / writes hit all disks

-      Can detect / correct lots of errors

-      Bad performance

-      Throughput ends up like RAID 3 (the whole group behaves as a single disk for each access) but with more check disks

 

RAID 3: bit parity

-      rely on the disk / controller to identify which disk failed, so a single parity disk is enough to reconstruct the data

-      Still read from all data disks, write to all disks including parity (see the reconstruction sketch below)
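
The property behind parity-based recovery: the parity strip is the bitwise XOR of the data strips, so any one lost strip can be rebuilt from the survivors. A small sketch with made-up data:

    from functools import reduce

    def parity(strips: list[bytes]) -> bytes:
        """XOR the strips together column by column."""
        return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*strips))

    data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]   # three data strips (illustrative)
    p = parity(data)

    # Lose data[1]; rebuild it by XORing the surviving strips with the parity.
    rebuilt = parity([data[0], data[2], p])
    assert rebuilt == data[1]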


 

 

RAID 4: block parity

-      use single disk for error correction, rely on controllers for detection

-      Can read from a single disk (no need to compute ecc)

-      can write to two disks (data disk + update parity)

-      Bottleneck: single parity disk for all writes

-      Small writes require 4 accesses: read old block, read old parity, write new block, write new parity (see the sketch below)
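
A sketch of the small-write update, with disks modeled as simple block maps (all values illustrative). The point is that new parity = old parity XOR old data XOR new data, so the other data disks never have to be read:

    def xor(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    data_disk = {7: b"\x0f\x0f"}        # block 7 on one data disk
    parity_disk = {7: b"\xaa\x55"}      # parity over all data disks for block 7

    def small_write(block: int, new_data: bytes):
        old_data = data_disk[block]            # access 1: read old data
        old_parity = parity_disk[block]        # access 2: read old parity
        new_parity = xor(xor(old_parity, old_data), new_data)
        data_disk[block] = new_data            # access 3: write new data
        parity_disk[block] = new_parity        # access 4: write new parity

    small_write(7, b"\x00\xff")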

 

RAID 5: distributed parity

-      same as RAID 4, but the parity disk rotates from stripe to stripe (see the layout sketch below)

-      Removes hotspot of parity disk

-      Large writes efficient – just one extra access for parity
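
A sketch of one possible parity rotation; the paper does not mandate a particular layout, so the rotation rule below is just one common choice:

    G = 5   # disks in the group (illustrative)

    def parity_disk(stripe: int) -> int:
        """Disk holding the parity strip for this stripe (rotates every stripe)."""
        return (G - 1 - stripe) % G

    def data_disks(stripe: int) -> list[int]:
        return [d for d in range(G) if d != parity_disk(stripe)]

    for stripe in range(G):
        print(f"stripe {stripe}: parity on disk {parity_disk(stripe)}, data on {data_disks(stripe)}")

Over G consecutive stripes every disk holds parity exactly once, so parity writes spread evenly instead of hammering one disk.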

 

RAID 6: more error correction

-      2 parity disks allow recovery from up to 2 disk failures


 

Throughput per dollar (group size G):

              small read    small write     large read    large write    storage efficiency    reason
RAID 0        1             1               1             1              1
RAID 1        1             ½               1             ½              ½                     extra disk
RAID 3        1/G           1/G             (G-1)/G       (G-1)/G        (G-1)/G               one disk doesn't contribute
RAID 5        1             max(1/G, ¼)     1             (G-1)/G        (G-1)/G

Notes: RAID 2 is inferior – like RAID 3 but with more ECC drives. RAID 4 is inferior to RAID 5 – similar best-case throughput, but small writes are limited by the single parity disk

 

Choices of RAID

-      QUESTION: what should you choose, when?

-      Issues:

o      Cost of disks – is it relevant? Perhaps space/power more relevant

o      Workload: lots of small reads/writes favors RAID 1; mostly large reads and writes favors RAID 5

 

 

NASD

 

Technology trends:

-      need distributed file system

-      file server is bottleneck between client and data

-      QUESTION: how do you scale up a file system?

o      A: partition

¤       Still limited by disk-to-server bandwidth

¤       Partitioning usually limited to certain areas, e.g. volumes, mount points


 

 

Approaches:

-      SAN: storage area networks

o      attach disks to network

o      Block level interface (read block, write block)

o      Cooperating file systems to make it work

o      Offers block-level management: backup, shadow, RAID

-      NAS: network attached storage

o      Richer interface to data: e.g. file systems, objects

o      Inherits SAN benefits if implemented on SAN

-      NASD: network attached disks

 

 

PROBLEM STATEMENT:

-      bandwidth to clients limited by need for a centralized file manager

o      QUESTION: Why?

o      FS semantics, consistency, naming

-      File system requires unnecessary copies

o      Off disk to network

o      network to memory

o      Off memory to network

o      network to client memory

 

 

ENABLING TECHNOLOGY:

-      I/O bound applications: multimedia, databases, data mining

-      New drive interfaces: drives can be put directly on the network (e.g. iSCSI)

-      Smarter drives – more opportunities for programming them

-      Storage networks and computer networks are converging

-      Storage servers (e.g. nfs, afs) not cost effective: server cost is dominant cost unless many disks attached

 

NASD Idea:

-      separate metadata & management from data transfer

-      Provide a security mechanism that lets the disk sit directly on the network, without interposed control

-      Principles:

o      Data transferred directly from disk to client, not through the server

o      Asynchronous oversight: client can perform operations w/o synchronous access to manager. E.g. can read / write data without contacting manager. Policy info provided by manager as a capability, enforced by disk

o      Object based interface: not blocks or files, but variable-length objects. File manager can use them as whole files or stripes. Provides more semantics for disk – more information available

-      client talks to the file manager to open files, create directories, etc.

-      File manager returns a capability that allows the client to access the disk directly (see the flow sketch below)
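
A toy sketch of the control-path / data-path split; the class and method names are invented for illustration, not the paper's interface:

    class Drive:
        """Stores objects and honors capabilities; knows nothing about names or policy."""
        def __init__(self):
            self.objects = {}           # object id -> bytes
            self.valid_caps = set()     # stand-in for real cryptographic checking

        def read(self, obj_id, cap):
            assert cap in self.valid_caps, "drive rejects requests without a valid capability"
            return self.objects[obj_id]

    class FileManager:
        """Owns naming and access policy; sits on the control path only."""
        def __init__(self, drive):
            self.drive = drive
            self.namespace = {}         # path -> object id

        def open(self, path):
            obj_id = self.namespace[path]
            cap = ("cap", obj_id)       # mint a capability for this object
            self.drive.valid_caps.add(cap)
            return obj_id, cap

    drive = Drive()
    drive.objects[42] = b"hello"
    fm = FileManager(drive)
    fm.namespace["/a"] = 42

    obj_id, cap = fm.open("/a")          # one round trip to the file manager...
    print(drive.read(obj_id, cap))       # ...then data moves drive -> client directly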

 

NASD interface:

-      functions to access objects

-      Secured with capabilities (like Kerberos tickets; see the verification sketch after this list)

o      Encrypted with disk key

o      Contains private session key

o      Client must prove it knows the session key with an authenticator

o      May contain policy for disk to enforce

o      Contains byte range for access (e.g. can limit to part of the object)
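
A simplified sketch of how such a capability could be minted and checked; the real NASD key hierarchy and wire format differ, so the field names and the HMAC-based key derivation below are assumptions, not the paper's protocol:

    import hmac, hashlib, json

    DRIVE_KEY = b"secret shared by file manager and drive"   # illustrative key

    def mint_capability(obj_id, byte_range, rights):
        """File manager: build capability fields and derive a session key from them."""
        fields = {"obj": obj_id, "range": byte_range, "rights": rights}
        blob = json.dumps(fields, sort_keys=True).encode()
        session_key = hmac.new(DRIVE_KEY, blob, hashlib.sha256).digest()
        return fields, session_key          # fields + session key go to the client

    def client_request(fields, session_key, offset, length):
        """Client: prove possession of the session key with an authenticator."""
        req = {"op": "read", "offset": offset, "length": length}
        msg = json.dumps({"cap": fields, "req": req}, sort_keys=True).encode()
        return req, hmac.new(session_key, msg, hashlib.sha256).digest()

    def drive_check(fields, req, authenticator):
        """Drive: re-derive the session key, verify the authenticator, enforce policy."""
        blob = json.dumps(fields, sort_keys=True).encode()
        session_key = hmac.new(DRIVE_KEY, blob, hashlib.sha256).digest()
        msg = json.dumps({"cap": fields, "req": req}, sort_keys=True).encode()
        ok = hmac.compare_digest(authenticator,
                                 hmac.new(session_key, msg, hashlib.sha256).digest())
        lo, hi = fields["range"]
        in_range = lo <= req["offset"] and req["offset"] + req["length"] <= hi
        return ok and "read" in fields["rights"] and in_range

    cap_fields, key = mint_capability(obj_id=7, byte_range=(0, 4096), rights=["read"])
    req, auth = client_request(cap_fields, key, offset=0, length=512)
    assert drive_check(cap_fields, req, auth)

The client never learns the drive key, and the drive needs no call to the file manager on the data path – the capability carries the policy.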

 

USING NASD

 

NFS:

-      files == objects

-      Lookup done on server, return capabilities

-      Attributes map onto object attributes, or are stored uninterpreted by the disk and interpreted by the client-side NFS library

 

AFS:

-      files == objects

-      Clients parse directories, must ask file manager for a capability to a file

-      Consistency model (invalidate callbacks on write) changes because writes not reported to manager; manager instead invalidates on open-for-write

-      Quotas handled by granting access to more data than current size (update after close)

 

NASD PFS

-      parallel file system by striping data across disks

-      New storage layer, Cheops, implements striping (RAID 0) but same object interface

o      Translates an access to a striped object into accesses (and capabilities) for the component objects on the individual drives (see the sketch below)

o      Stripes data in 512 KB chunks
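
A sketch of how a Cheops-style layer might split a byte-range read on the striped object into per-drive reads on the component objects, assuming 512 KB stripe units and round-robin placement (drive count is illustrative):

    STRIPE_UNIT = 512 * 1024      # bytes per stripe unit
    NUM_DRIVES = 4                # drives the object is striped across (illustrative)

    def split_read(offset: int, length: int):
        """Yield (drive index, offset in component object, length) for each piece."""
        end = offset + length
        while offset < end:
            unit = offset // STRIPE_UNIT                  # which stripe unit overall
            drive = unit % NUM_DRIVES                     # round-robin placement
            within = offset % STRIPE_UNIT                 # offset inside this unit
            comp_offset = (unit // NUM_DRIVES) * STRIPE_UNIT + within
            n = min(STRIPE_UNIT - within, end - offset)   # do not cross a unit boundary
            yield drive, comp_offset, n
            offset += n

    for piece in split_read(offset=1_000_000, length=1_500_000):
        print(piece)

Each piece would go to a different drive's component object, using the capability Cheops obtained for that object.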